Generative Adversarial Network-Based Postfilter for STFT Spectrograms
نویسندگان
چکیده
We propose a learning-based postfilter to reconstruct the high-fidelity spectral texture in short-term Fourier transform (STFT) spectrograms. In speech-processing systems, such as speech synthesis, conversion, enhancement, separation, and coding, STFT spectrograms have been widely used as key acoustic representations. In these tasks, we normally need to precisely generate or predict the representations from inputs; however, generated spectra typically lack the fine structures that are close to those of the true data. To overcome these limitations and reconstruct spectra having finer structures, we propose a generative adversarial network (GAN)-based postfilter that is implicitly optimized to match the true feature distribution in adversarial learning. The challenge with this postfilter is that a GAN cannot be easily trained for very high-dimensional data such as STFT spectra. We take a simple divide-and-concatenate strategy. Namely, we first divide the spectrograms into multiple frequency bands with overlap, reconstruct the individual bands using the GAN-based postfilter trained for each band, and finally connect the bands with overlap. We tested our proposed postfilter on a deep neural network-based text-to-speech task and confirmed that it was able to reduce the gap between synthesized and target spectra, even in the high-dimensional STFT domain.
منابع مشابه
Automatic Colorization of Grayscale Images Using Generative Adversarial Networks
Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...
متن کاملConditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image proc...
متن کاملImprovement of generative adversarial networks for automatic text-to-image generation
This research is related to the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous researches have used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions that are presented in the form of sentences to produce and improve the image. The proposed ...
متن کاملCorrelation of STFT Spectrograms Applied to the Voice Deal Function in Mobile Phones
For the past five decades there has been a substantial increase in the development of computer applications based on the STFT (Short-Time Fourier Transform) spectrograms. Amongst others, a relevant example of this kind of implementation is speech analysis. Indeed, if the audio signal is correctly filtered, it is possible to use STFT spectrogram techniques to recognize speech. This work investig...
متن کاملSSGAN: Secure Steganography Based on Generative Adversarial Networks
In this paper, a novel strategy of Secure Steganograpy based on Generative Adversarial Networks is proposed to generate suitable and secure covers for steganography. The proposed architecture has one generative network, and two discriminative networks. The generative network mainly evaluates the visual quality of the generated images for steganography, and the discriminative networks are utiliz...
متن کامل